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Abstract Let {x l5 X2, • ■ ■ , x n } be a vector of real numbers. An integer relation algorithm 
is a computational scheme to find the n integers a&, if they exist, such that a\X\ + 
^2^2 + • ■ • + a n x n = 0. In the past few years, integer relation algorithms have been 
utilized to discover new results in mathematics and physics. Existing programs for this 
purpose require very large amounts of computer time, due in part to the requirement 
for multiprecision arithmetic, yet are poorly suited for parallel processing. This paper 
presents a new integer relation algorithm designed for parallel computer systems, but as 
a bonus it also gives superior results on single processor systems. Single- and multi-level 
implementations of this algorithm are described, together with performance results on a 
parallel computer system. Several applications of these programs are discussed, including 
some new results in number theory, quantum field theory and chaos theory. 
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1. Introduction 

Let x = (xi,X2, • ■ • ,x n ) be a vector of real numbers, x is said to possess an integer 
relation if there exist integers Oj, not all zero, such that a±xi + 02^2 + • • • + a n x n = 0. By 
an integer relation algorithm, we mean a practical computational scheme that can recover 
(provided the computer implementation has sufficient numeric precision) the vector of 
integers a,, if it exists, or can produce bounds within which no integer relation exists. 

The problem of finding integer relations among a set of real numbers was first studied 
by Euclid, who gave an iterative scheme which, when applied to two real numbers, either 
terminates, yielding an exact relation, or produces an infinite sequence of approximate 
relations. The generalization of this problem for n > 2 was attempted by Euler, Jacobi, 
Poincare, Minkowski, Perron, Brun, Bernstein, among others. The first integer relation 
algorithm with the required properties mentioned above was discovered in 1977 by Fergu- 
son and Forcade [pL8|| . Since then, a number of other integer relation algorithms have been 
discovered, including the "HJLS" algorithm |19] (which is based on the LLL algorithm), 
and the "PSLQ" algorithm. 

2. The PSLQ Algorithm 

The PSLQ integer relation algorithm features excellent numerical stability, and it 
is effective in recovering a relation when the input is known to only limited precision. 
It has been generalized to complex and even quaternion number systems. A detailed 
discussion of the PSLQ algorithm, together with a proof that the algorithm is guaranteed 
to recover a relation in a polynomially bounded number of iterations, is given in |17| . 
The name "PSLQ" derives from its usage of a partial sum of squares vector and a LQ 
(lower-diagonal-orthogonal) matrix factorization. 

A simple statement of the PSLQ algorithm, which is entirely equivalent to the original 
formulation, is as follows: Let x be the n-long input real vector, and let nint denote the 
nearest integer function (for exact half-integer values, define nint to be the integer with 

greater absolute value). Select 7 > y4/3 (the authors use 7 = ^4/3). Then perform the 
following operations: 

Initialize: 

1. For j := 1 to n: for i :— 1 to n: if i = j then set Ay := 1 and B^ := 1 else set 

:= and Bij := 0; endfor; endfor. 

2. For k :— 1 to n: set s k ■— \Jj2j=k x % endfor. Set t = l/s\. For k :— 1 to n: set 
y k :=tx k ; s k := ts k ; endfor. 

3. Initial H: For j := 1 to n — 1: for % := 1 to j — 1: set Hij := 0; endfor; set 
Hjj := Sj + i/sj] for i :— j + 1 to n: set := — ^^/(sjSj+i); endfor; endfor. 

4. Reduce H: For % :— 2 to n: for j := i — 1 to 1 step —1: set t := nint(i/jj / Hjj); and 
Vj := Vj + f° r k := 1 to j: set H ik := H ik — tHj k ; endfor; for k := 1 to n: set 
A ik := A ik — tAj k and B k j := B k j + tB ki ; endfor; endfor; endfor. 

Iteration: Repeat the following steps until precision has been exhausted or a relation has 
been detected. 
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1. 



2. Exchange the entries of y indexed m and m + 1, the corresponding rows of A and 
if, and the corresponding columns of B. 



3. Remove corner on H diagonal: If m < n — 2 then set t := JH% lTn + H m 



ti := H mm /t and t 2 := H m>m+ i/t ; for i := m to n: set t 3 := # im , t 4 := H itm+1 , 
Him '■= tit 3 + £ 2 ^4 and H^ m+ \ := —^3 + M4; endfor; endif. 

4. Reduce H: For z := m + 1 to n: for j := min(i — l,m + 1) to 1 step —1: set 
t := mnt(Hij/Hjj) and ?/,- := j/j +tyi] for fc := 1 to j: set ifj*, := #ifc — tHjj.; endfor; 
for := 1 to n: set := Ajfc — tAjjfe and Bkj '■= Bkj + tBki] endfor; endfor; endfor. 

5. Norm bound: Compute M :— 1/ max^ Then there can exist no relation vector 
whose Euclidean norm is less than M. 

6. Termination test: If the largest entry of A exceeds the level of numeric precision 
used, then precision is exhausted. If the smallest entry of the y vector is less than 
the detection threshold (see below), a relation has been detected and is given in the 
corresponding column of B. 

It should be emphasized that for almost all applications of an integer relation algorithm 
such as PSLQ, very high precision arithmetic must be used. Only a very small class of 
relations can be recovered reliably with the 64-bit IEEE floating-point arithmetic that is 
available on current computer systems. In general, if one wishes to recover a relation of 
length n, with coefficients of maximum size d digits, then it follows by an information 
theory argument that the input vector x must be specified to at least nd digits, and one 
must employ floating-point arithmetic accurate to at least nd digits. Practical integer 
relation programs always require greater precision than this bound. In fact, the difference 
between the level of precision required for a given problem and the information theory 
bound is a key figure of merit for integer relation algorithms. PSLQ is very efficient in 
this regard — for most problems, PSLQ programs can reliably recover relations with only 
about 15% more digits of precision than the information theory bound. 

The software products Maple and Mathematica include multiple precision arithmetic 
facilities. One may also use any of several freeware multiprecision software packages, 
such as the MPFUN package (Fortran-77 and Fortran-90 versions are available), which 
was developed by the first author []1|, 0, and the C/C++ version of MPFUN, which was 
recently developed by Sid Chatterjee and Hermann Harjono of the University of North 
Carolina . The two MPFUN packages permit one to write a program in conventional 
Fortran-77/90 or C/C++, respectively, identifying some or all of the variables to be 
multiple precision (integer, real or complex). Then in expressions where these variables 
appear, the appropriate multiple precision routines are automatically referenced, thus 
saving considerable programming effort. 

In the course of the operation of the PSLQ algorithm on a real computer system, the 
entries of the y vector gradually decrease in size, with the largest and smallest entries 
usually differing by no more than two or three orders of magnitude. When a relation is 
detected by the algorithm, the smallest entry of the y vector abruptly decreases to roughly 
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the multiprecision "epsilon" (i.e. 10 _p , where p > nd is the precision level in digits). 
The detection threshold in the termination test (iteration step 6) above is typically set 
to be a few orders of magnitude greater than the epsilon value, in order to allow for 
reliable relation detection in the presence of some numerical round-off error. The ratio 
between the smallest and the largest y entry when a relation is detected can be taken as 
a "confidence level" that the relation is a true relation and not an artifact of insufficient 
numeric precision. Very small ratios at detection, such as 10~ 100 , almost certainly denote 
a true relation. 

As shown in [O, the PSLQ algorithm is guaranteed to find relations in a bounded 
number of iterations. However, this result is based on the assumption of perfect, infinite- 
precision arithmetic. In an implementation on a real computer system, one can never rule 
out hardware, software and programming errors, although the chances of these errors can 
be minimized by independent computations. Also, PSLQ programs utilize multiprecision 
software with finite working precision, and they make decisions based on numerical toler- 
ances. Thus it is possible that numerical anomalies can result, although these anomalies 
generally can be remedied by using higher precision. 

3. Some Applications of the PSLQ Algorithm 

One application of PSLQ in the field of mathematical number theory is to determine 
whether or not a given constant a, whose value can be computed to high precision, is 
algebraic of some degree n or less. This can be done by first computing the vector 
x = (1, a, a 2 , ■ ■ ■ , a n ) to high precision and then applying an integer relation algorithm. 
If a relation is found for x, then this relation vector is precisely the set of integer coefficients 
of a polynomial satisfied by a. If a relation is not found, the maximum bound determined 
by PSLQ means that a cannot be the root of an polynomial of degree less than or equal 
to n, with integer coefficients whose size (Euclidean norm) is less than the established 
bound. For example, it is well known |7[ that 

OO 1 

c(2) ■ 'Ssgy 

c(3) = 2 £ 

> / i \ 36^ 1 



These results have led some to suggest that 

oo ( -\\k— 1 



* - C(5)/ S 

might also be a simple rational or algebraic number. Computations using the PSLQ 
algorithm have established that if Z 5 satisfies a polynomial of degree 25 or less, then 
the Euclidean norm of the coefficients must exceed 2 x 10 37 . Results such as this strongly 
suggest that the constants ((n) for n > 4 are not given by simple one-term formulas 
as above. Indeed, this "negative" result was fruitful in that it led to the discovery of 



multi-term identities for such sums [10]. An example will be given in section 8. 
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One of the first "positive" results of this sort was the identification of the constant 
B 3 = 3.54409035955 •••[[[]]. B 3 is the third bifurcation point of the logistic map Xk+i = 
rxk{l — Xk), which exhibits period doubling shortly before the onset of chaos. To be 
precise, B 3 is the smallest value of the parameter r such that successive iterates Xk exhibit 
eight-way periodicity instead of four-way periodicity. Computations using a predecessor 
algorithm to PSLQ found that B 3 is a root the polynomial 4913 + 2108t 2 -604t 3 -977t 4 + 
St 5 + Ut 6 + 392t 7 - 193t 8 - 40t 9 + 48t 10 - 12t u + t 12 . A stronger result will be given in 
section 8. 

A large number of results were recently found using PSLQ in the course of research on 
multiple sums, such as those evaluated in Table 1. After computing the numerical values 
of many of these constants, a PSLQ program was used to determine if a given constant 
satisfied an identity of a conjectured form. These efforts produced numerous empirical 
evaluations and suggested general results |J. Eventually, elegant proofs were found for 
many of these specific and general results and ||. Three examples of identities that 
are now proven are given in Table 1. In the table, ((t) = X^Lii~* is the Riemann zeta 
function, and Li„(x) = X^i^J ™ denotes the polylogarithm function. 

£?=i (i + £ + ■■■ + i) 2 (* + 1)" 4 = 4^ - C 2 (3) 

£?=i (l + \ + ■ ■ ■ + i) 3 (* + I)" 6 = C 3 (3) + WC(9) + |vr 2 C(7) 

- #)- 4 C(5) - ^ A(3) 
Efeli (l - | + ■ • • + (-l) k+1 l) 2 (k + l)- 3 = 4Li 5 (§) - £ ln 5 (2) - gC(5) 

- ^tt 4 ln(2) + JC(3) ln 2 (2) + ^ ln 3 (2) - |tt 2 C(3) 



Table 1: Specimen evaluations, found with PSLQ and now proven 

It has been found that there is an intimate connection between such multiple sums 
and the constants resulting from evaluation of Feynman diagrams in quantum field theory 
fj~T| , |I~2|j . In particular, the renormalization procedure (which removes infinities from the 
perturbation expansion) entails multiple zeta values defined by || 

v ! 



C(S1, 8 2 , ■ ■ ■ , 8 r ) - ^ ,, s ., 

k!>k2>->kr>0 K l K 2 ' ' 



The ( notation is used in analogy with Riemann's zeta function. The PSLQ algorithm 
was used to find formulas and identities involving these constants. Again, a fruitful theory 
emerged, including a large number of both specific and general results || ||. 

Some recent quantum field theory results using PSLQ are even more remarkable. For 
example, it has now been shown |13| that in each of ten cases with unit or zero mass, 
the finite part the scalar 3-loop tetrahedral vacuum Feynman diagram reduces to 4-letter 
"words" that represent iterated integrals in an alphabet of 7 "letters" comprising the 
one-forms Q := dx/x and Uk '■= dx/{\ — x), where A := (1 + \f—2>)/2 is the primitive 
sixth root of unity, and k runs from to 5. A 4-letter word is a 4-dimensional iterated 
integral, such as 
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u 



A/o2 x f 1 dx i f xi dx 2 f X2 dx 3 f x 3 dx 4 

C(n lo 3 u ) = — / — / — / 

JO X 1 JO X 2 Jo ( — l—X 3 )Jo (1 — X4) 



j>k>0 J 

There are 7 4 four-letter words. Only two of these are primitive terms occurring in the 
3-loop Feynman diagrams: U, above, and 



V := Real[C(fVWi)] = £ 

j>fc>0 



(-l}>'cos(27rA;/3) 
fk 



The remaining terms in the diagrams reduce to products of constants found in Feynman 
diagrams with fewer loops. These ten shown in Figure 1. In these diagrams, 

dots indicate particles with nonzero rest mass. The formulas that have been found for 
the corresponding constants are given in Table 2. The constant C = Y,k>o s ^ n ( n ^/^)/^ 2 - 




V\a V4JV V 5 Vq 

Figure 1: The ten tetrahedral cases 
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Table 2: Evaluations of the 10 constants corresponding to the 10 cases in Figure 1 
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4. A New Formula for Pi 

Through the centuries mathematicians have assumed that there is no shortcut to 
computing just the n-th digit of n. Thus, it came as no small surprise when such an 
algorithm was recently discovered M. In particular, this simple scheme allows one to 
compute the n-th hexadecimal (or binary) digit of it without computing any of the first 
n — 1 digits, without using multiple-precision arithmetic software, and at the expense of 
very little computer memory. The one millionth hex digit of n can be computed in this 
manner on a current-generation personal computer in only about 60 seconds run time. 

This scheme is based on the following new formula, which was discovered using PSLQ: 



7T 



E 

fc=0 



IQk 



1 



8k + 1 8k + 4 8k + 5 8k + 6 



It is likely the first instance in history of a significant new formula for tt discovered by 
computer. Further base-2 results are given in 0, 14[]. In fl3| base-3 results were obtained, 
including 



7T 



2 ~ 1 
27 729 k 
72 



243 



(12k + l) 2 
9 



405 



81 



27 



(12k 



2) 2 
9 



l2fc + 4) 2 
5 



(12A; + 5) 2 



l2A; + 6)2 (12fc + 7) 2 (12k + 8) 2 (12k + 10) 2 (12*; + ll) 2 



5. Multi-Level Implementations of PSLQ 

In spite of the relative efficiency of PSLQ compared to the other algorithms in the 
literature, computer run times of programs that straightforwardly implement the PSLQ 
algorithm are typically quite long. Even modest-sized problems can require many hours 
for solution on a current personal computer or workstation. This is mainly due to the cost 
of using high precision arithmetic software for nearly every operation in the algorithm. 

As it turns out, it is possible to perform most, if not all, of the PSLQ iterations us- 
ing ordinary 64-bit computer arithmetic, with only occasional recourse to multiprecision 
arithmetic. In this way, run times can be dramatically reduced. Here is a sketch of this 
scheme, which will be referred to as a "two-level" implementation of the PSLQ algorithm. 
In the following, "double precision" means the 64-bit IEEE hardware arithmetic avail- 
able on most current computer systems, and y, A, B and H denotes double precision 
counterparts to the arrays y, A, B and H in the PSLQ algorithm. 

First, perform the multiprecision initialization steps of PSLQ as given in section 2 
above. Then perform a double precision "re-initialization" step: set A and B to the n x n 
identity matrix; set y to the best double precision approximation of the current y vector, 
multiplied by a scale factor so that its largest entry is unity; and set H to the best double 
precision approximation of the current H matrix. For some extremely large problems it 
may be necessary to scale the H matrix to avoid numeric overflow. Then perform an 
LQ (lower-diagonal-orthogonal) matrix factorization on H, and replace H by the lower 
diagonal portion of the result (the upper right portion is zeroed). The subroutine DQRDC 
of the Linpack library [TIJ may be employed for this factorization, provided both the input 
and output matrices are transposed. 
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Next, perform PSLQ iterations using the double precision arrays. In the course of 
these iterations, the entries of A and B (which contain integer values, although stored as 
IEEE double precision data), steadily increase in size. Monitor the entries of these ma- 
trices as they are updated, and when any entry reaches a certain threshold (the authors 
use 10 13 ), or when the smallest y entry becomes smaller than a certain threshold (the au- 
thors use 10~ 14 ), then update the multiprecision arrays by means of matrix multiplication 
operations, as follows: 



V 


■= V 


B 


B 


:= B 


■ B 


A 


:= A 


■A 


H 


:= A 


■H 



After these updates are performed, the entries of the A matrix and the y vector are 
checked, as in the termination test (iteration step 6) of PSLQ, and a norm bound is 
computed. If neither of the termination conditions holds, then the double precision arrays 
are re-initialized again as mentioned above, another set of double precision iterations are 
performed, and the process continues. 

This general scheme works well for many problems, but there are several difficulties 
that must be dealt with in a fully robust implementation. One difficulty is that at some 
point in the computation (typically at the very beginning), the y vector may have a 
dynamic range that exceeds the range (11 or 12 orders of magnitude) that can be safely 
handled using double precision iterations. Another difficulty is that occasionally an entry 
is produced in the A or B matrix that exceeds the largest whole number (2 53 = 9.007 • • • x 
10 15 ) that can be exactly represented as 64-bit IEEE data. A straightforward solution 
when such a condition occurs is to abandon the current iteration, restore a previous 
iteration's values of y, A, B and H, update the multiprecision arrays as above, perform 
an LQ matrix factorization on the H matrix, and then perform iterations using full 
multiprecision arithmetic until these special conditions no longer hold. 

A more efficient solution for large problems that require very high precision is to 
employ "intermediate precision", in other words a fixed level of precision (the authors 
use 125 digits) that is intermediate between double precision and full multiprecision. 
Updating the full multiprecision arrays from the intermediate precision arrays is done 
with matrix multiplication operations in a manner precisely analogous to that described 
above. Incorporating intermediate precision in this manner gives rise to what we will refer 
to as a "three-level" implementation of PSLQ. 

One additional improvement that can be made to each of these schemes is to omit 
multiprecision computation of the A matrix (although the double precision and interme- 
diate precision equivalents of A must be computed). The multiprecision A matrix (which 
is the inverse of the B matrix) is used in the PSLQ algorithm only to determine when ex- 
ecution must be halted due to the exhaustion of numeric precision. However, exhaustion 
of numeric precision can alternatively be handled by halting iterations when the smallest 
y entry is sufficiently close to the multiprecision epsilon level (the authors use a factor of 
10 25 ). 

These three PSLQ schemes (one-level, two-level and three-level) have been imple- 
mented by the first author, using the Fortran-90 MPFUN software 0. Some perfor- 
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Table 3: Run times for the three PSLQ programs 

mance results are shown in Table 3 for a class of problems. Here r, s define the constant 
a = 3 1 / 7, — 2 1 ^ s , which is algebraic of degree rs, and n = rs + 1. The n-long vector 
of coefficients of the polynomial satisfied by a can thus be obtained by using a PSLQ 
program, as explained in section 3. The column headed "Iterations" gives the number 
of PSLQ iterations required for solution, while "Digits" gives the working precision level 
used, in decimal digits. "Time" gives CPU time in seconds for runs on a single processor 
of an SGI Origin-2000 system with 195 MHz R10000 CPUs. 

It can be seen from these results that the two-level PSLQ program is up to 65 times 
faster than the one-level program, yet it finds relations just as well, usually in exactly the 
same course of iterations as the one-level program. The three-level program is faster than 
the two-level program for large problems, even though the special conditions mentioned 
above rarely arise in the particular problems mentioned in the table. The reason for this 
fortunate circumstance appears to be improved data locality in the three-level scheme, 
which is advantageous on modern cache-based computer systems. Fully detailed computer 
programs are available from the authors at the web site [http : //www . nersc . gov/~dhb| . 

6. The Multi-Pair Algorithm 

Even with the substantial accelerations described in the previous section, run times 
are painfully long for some very large problems of current interest in mathematics and 
physics. Thus one is led to consider employing highly parallel supercomputers, which have 
the potential of performance hundreds of times faster than for single-processor scientific 
workstations and personal computers. 

Unfortunately, the standard PSLQ algorithm appears singularly unsuited for modern 
parallel computer systems, which require high levels of coarse-grained concurrency. The 
main difficulty is that large integer relation problems often require over one million PSLQ 
iterations, each of which must be completed before the next begins. Further, within an 
individual iteration, the key reduction operation (iteration step 4) has a recursion that 
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inhibits any possibility for parallel execution, except at the innermost loop level. These 
considerations have led some researchers in the field to conclude that there is no hope for 
any significant parallel acceleration of PSLQ-type computations. 

But it turns out that a variant of the PSLQ algorithm can be formulated that dra- 
matically reduces the number of sequential iterations that must be performed, while at 
the same time exhibiting reasonably high concurrency in the major steps of individual 
iterations. To that end, consider the following algorithm, which will be referred to as the 
"multi-pair" variant of PSLQ. Here 7 = ^4/3 as before, and (5 = 0.4. 

Initialize: 

1. For j := 1 to n: for i :— 1 to n: if i = j then set : = 1 and : = 1 else set 
Aij : = and := 0; endfor; endfor. 



2. For k :— 1 to n: set s k :— \JYJj=k x2 j'-> endfor; set t = 1/si; for k :— 1 to n: set 
y k :=tx k ; s k : = ts k ; endfor. 

3. Initial H: For j :— 1 to n — 1: for % :— 1 to j — 1: set H^ := 0; endfor; set 
Hjj := Sj + i/sj] for i :— j + 1 to n: set := — yii/j/ (sjSj + i); endfor; endfor. 

Iteration: Repeat the following steps until precision has been exhausted or a relation has 
been detected. 

1. Sort the entries of the (n — l)-long vector {7*|ifjj|} in decreasing order, producing 
the sort indices. 

2. Beginning at the sort index m x corresponding to the largest Y\Hu\, select pairs of 
indices (rrii,mi + 1), where rrii is the sort index. If at any step either m 8 or rrii + 1 
has already been selected, pass to the next index in the list. Continue until either 
f3n pairs have been selected, or the list is exhausted. Let p denote the number of 
pairs actually selected in this manner. 

3. For i :— 1 to p, exchange the entries of y indexed rrii and + 1, and the corre- 
sponding rows of A, B and H; endfor. 

4. Remove corners on H diagonal: For % :— 1 to p: if rrii < n — 2 then set to : = 
V ' H mi,mi + H mi,mi+i, h ■= H mum ./t and t 2 := H mum . +1 /t ; for i := m 8 to n: set 
^3 := H irn .; t 4 := H ijm . +1 ] H im . := tit 3 + t 2 ^4; and H i)m . + i := —t 2 t 3 + tit 4 ; endfor; 
endif; endfor. 

5. Reduce H: For % :— 2 to n: for j :— 1 to n — % + 1: set / := i + j — 1; for 
k :— j + 1 to I — 1: set := — Ti k H k f, endfor; set Tij := mnt(Hij/Hjj) and 
Hij := Hij — TijHjj] endfor; endfor. 

6. Update y: For j :— 1 to n — 1: for i :— j + 1 to n: set yj := yj + T^yf, endfor; 
endfor. 

7. Update A and B: For k :— 1 to n: for j :— 1 to n — 1: for % :— j + 1 to n: set 
A ik := A ik — TijAj k and Bj k := Bj k + TijB ik ; endfor; endfor; endfor. 
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8. Norm bound: Compute M :— 1/ max,,- \Hjj\. Then there can exist no relation vector 
whose Euclidean norm is less than M. 

9. Termination test: If the largest entry of A exceeds the level of numeric precision 
used, then precision is exhausted. If the smallest entry of the y vector is less than 
the detection threshold (see section 2), a relation has been detected and is given in 
the corresponding row of B. 

There are several differences between this algorithm and the standard one-level PSLQ 
algorithm: (1) there is no reduction step in the initialization; (2) the B matrix is trans- 
posed from the standard PSLQ algorithm; (3) up to (3n disjoint pairs (not just a single 
pair) of adjacent indices are selected in each iteration; (4) the H reduction loop proceeds 
along successive lower diagonals of the H matrix; (5) a T matrix is employed, which 
contains the t multipliers of the standard PSLQ; and (6) the y, A and B arrays are not 
updated with H, but in separate loops. 

Since the multi-pair algorithm maintains the H matrix in lower triangular form, and 
the A and B matrices are maintained as integer matrices, the norm bound stated in 
iteration step 8 above is valid, by the same argument that applies to the original PSLQ 
algorithm (jTjJ . 

Unfortunately, we cannot offer a proof that the multi-pair algorithm is guaranteed 
to recover a relation in a bounded number of iterations, as can be done with PSLQ. In 
fact, it has been found that for certain special problems, the multi-pair algorithm, as 
stated above, falls into a repeating cycle, with a period of (usually) two iterations. Our 
implementation deals with this difficulty by comparing the y vector at the end of each 
iteration with saved copies from eight previous iterations, and if a duplication is found, 
then only one pair of indices is selected in step 2 of the next iteration (so that the next 
iteration is equivalent to a standard PSLQ iteration). It should be added, however, that 
these repeating situations are extremely rare in nontrivial problems. We have not seen 
any instances of such repeats when n > 20. 

On the positive side, we have found, based on our experience with a wide variety of 
sample problems, that the norm bound increases much more rapidly than in the standard 
PSLQ. Indeed, it appears that the selection of up to (3n disjoint pairs of indices in step 2 
above has the effect of reducing the iteration count by nearly the factor (3n. This results 
in a significant saving in the number of expensive H reduction and array update steps. 
More importantly, without this dramatic reduction in the sequential iteration count, an 
efficient parallel implementation would not be possible. Parallel issues will be discussed 
in greater detail in the next section. 

Given that the multi-level implementations of PSLQ are so much faster than the 
standard one-level PSLQ, one might also wonder whether there exist analogous multi-level 
implementations of the multi-pair algorithm. Happily, the multi-level scheme sketched in 
section 5 can be adopted almost without change. One change that is required is that the 
multiprecision arrays are updated as follows: 



y 


:= B 


y 


B 


:= B 


B 


A 


:= A 


A 
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H := A-H 



Note that y and B are updated here in the same manner as the A and H arrays. This 
change stems from the fact that the B matrix in the multi-pair scheme is transposed from 
the B matrix in the standard PSLQ algorithm. 

The multi-pair algorithm and the multi-level implementations described here were all 
devised to permit parallel processing. But it turns out that these programs also run faster 
on a single processor system, compared with the standard PSLQ equivalents. Some one- 
processor timings are shown in Table 4 for the suite of test problems used in Table 3. 
Note for example that the one-level multi-pair program is up to twice as fast as the one- 
level PSLQ program, and the three-level multi-pair program is up to 22% faster than the 
three-level PSLQ program. Note also that the iteration counts are reduced by a factor of 
up to 34. Finally, note that the multi-pair schemes require slightly less numeric precision 
for solution than their PSLQ counterparts. The reason for this unanticipated benefit is 
not known. 









One 


-level 


Two-level 


Three-level 


r, s 


n 


Iterations 


Digits 


Time 


Digits 


Time 


Digits 


Time 


5,5 


26 


558 


180 


26.08 


180 


1.48 






5,6 


31 


840 


230 


70.71 


240 


3.43 






6,6 


37 


1136 


310 


189.27 


310 


7.84 






6,7 


43 


1625 


400 


479.07 


410 


17.22 






7,7 


50 


2071 


500 


1130.85 


500 


35.64 






7,8 


57 


2410 






660 


69.39 






8,8 


65 


3723 






800 


169.62 


880 


214.66 


8,9 


73 


4943 






1010 


358.07 


1100 


427.29 


9,9 


82 


6169 






1260 


744.20 


1320 


804.51 


9,10 


91 


7850 






1560 


1556.37 


1600 


1450.29 


10,10 


101 


10017 






1890 


3283.08 


1950 


2747.12 



Table 4: Run times for the three multi-pair programs 



7. Parallel Implementations of the Multi-Pair Algorithm 

The key steps of the multi-pair iterations are all suitable for parallel execution. First 
note that the p row exchanges in iteration step 3, as well as the p corner removal operations 
in step 4, can be performed concurrently, since the p pairs of indices (m^mj + 1) are all 
disjoint. Secondly, the reorganized H matrix reduction step (step 5), which is equivalent 
to the H matrix reduction scheme in the standard PSLQ, may be performed concurrently 
at the second loop level, instead of only at the innermost loop level as in standard PSLQ. 
The update of the A and B arrays (step 7) is even more favorable to parallel processing: 
this loop may be performed concurrently at the outermost loop level. The change in the 
B matrix, which is transposed from the standard PSLQ algorithm, is favorable for an 
implementation on a distributed memory parallel computer. 
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The two- and three-level multi-pair schemes are also well suited for parallel computa- 
tion. This is because the dominant cost of these programs is the matrix multiplication 
operations involved in the multiprecision array updates, and these matrix multiplica- 
tions can be performed concurrently at the outermost loop level. The parallel techniques 
mentioned in the previous paragraph can still be applied to the double precision and in- 
termediate precision iterations. It turns out, though, that the double precision iterations 
run so rapidly that parallel processing of these iterations is often not worth the overhead. 
Nonetheless, we have achieved modest acceleration on very large problems by using par- 
allel processing on some steps of double precision iterations. Some parallel performance 
results will be given in the next section. 

8. Large Applications and Parallel Performance 

Three recent applications will be described here, each of which involves very large inte- 
ger relation problems. Thus they are excellent test cases for the new multi-pair programs. 

Reduction of Euler sums: In section 3, we mentioned recent research on multiple zeta 
values, which play a key role in quantum field theory |[T^] . More generally, one may define 
Euler sums by [0] 



c 



where <jj = ±1 are signs and Sj > are integers. When all the signs are positive, one 
has a multiple zeta value. Constants with alternating signs appear in problems such as 
computation of the magnetic moment of the electron. 

It was conjectured by the second author that the dimension of the space of Euler sums 
with weight w := J2j s j is the Fibonacci number F w+1 = F w + F w _x, with Fx — F 2 — 1. 
Complete reductions of all Euler sums to a basis of size F w+ i have been obtained with 
PSLQ at weights w < 9. At weights w = 10 and w = 11 the conjecture has been 
stringently tested by application of PSLQ in more than 600 cases. At weight w — 11 
such tests involve solving integer relations of size n = F i2 + 1 = 145. In a typical case, 
each of the 145 constants was computed to more than 5,000 digit accuracy, and a working 
precision level of 5,000 digits was employed in the three-level multi-pair program. A 
relation was detected at iteration 31,784. The minimum and maximum y vector entries 
at the point of detection were 9.515 x io -4970 and 4.841 x 10~ 4615 , respectively. The ratio of 
these two values (i.e. the "confidence level") is a tiny 1.965 x 10~ 355 . Moreover, the ratio 
of the last two recovered integer coefficients is precisely — 11! = —39916800. Given these 
facts, we can dismiss the possibility that the recovered relation is a spurious numerical 
artifact. 

Bifurcation to a 16-cycle: A second large application that we shall mention here is the 
problem of determining the polynomial satisfied by the constant = 3.564407268705 • • •, 
the fourth bifurcation point of the logistic map x/c+i = rxk{l — %k)- 111 section 3 we noted 
that an 8-cycle begins at r = B 3 , where B 3 satisfies a polynomial equation of degree 12. 
At r = B4, this gives way to 16-cycle. It has been recognized that all B^ are algebraic, 
but nothing has been known about the degrees or the coefficients of the polynomials 
satisfied by these constants for k > 3. Some conjectural reasoning had suggested that 
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B4 might satisfy a 240-degree polynomial, and some further analysis had suggested that 
the constant a = —B^B^ — 2) might satisfy a 120-degree polynomial. In order to test 
this hypothesis, the three- level multi-pair program was applied to the 121-long vector 
(l,«,« 2 ,---,a 120 ). 

In this case the input data was computed to over 10,000 digit accuracy, and a working 
precision of 9,500 digits was employed in the three-level multi-pair program. A relation 
was detected at iteration 56,666. The minimum and maximum y vector entries at the point 
of detection were 1.086 x l(T 9428 and 3.931 x l(T 8889 , which form the ratio 2.763 x 10" 540 . 
Further, the recovered integer coefficients descend monotonically from 257 30 ~ 1.986x 10 72 
to one. Again, these facts argue very strongly against the solution being a spurious 
numerical artifact. 

Reductions to Multiple Clausen Values: As a third application, consider sums of 
the form 



S(k) 



n>o n 1 



1 



with, for example, S(4) = 177r 4 /3240. Researchers have sought analytic evaluations of 
these constants for k > 4. As a result of PSLQ computations, the constants {S(k) \ k = 
5 ... 20} have been evaluated in terms of multiple zeta values and multiple Clausen values 
of the form 
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M(a,b) :-- 



with, for example, 



E 

ni>n,2>...>n b >0 



sin(ni7r/3) 



??.- 



7=1 3 



S(9) 



2M(7,l) + -M(5,3) + -C(2)M(5,i; 



13921 
216 



C(9) 



+ ^C(7)C(2) + ^C(6)C(3) 



331 



486 



648 



C(5)C(4) - -C 3 (3) 



The evaluation of the constant S(20) is a 118-dimensional integer relation problem, which 
required 4800 digit arithmetic. In this case a relation was detected at iteration 27,531. 
The minimum and maximum y vector entry at detection were 7.170 x io~ 4755 and 3.513 x 
10~ 4375 , which gives a confidence ratio of 2.040 x 10 -380 . The actual solution for this 
problem is shown in Table 5. In this table, irreducible multiple zeta values such £(5, 3) : = 
J2j>k>oJ 5 ^ 3 occur. Moreover, there are alternating Euler sums, such as C(9,3) : = 



J2j>k>o(~ l) J 'i -9 (— 1) &~ 3 , where an alternating sign is indicated by a bar. The presence 
of the latter results from another discovery obtained with PSLQ |Tl|], namely that some 
multiple zeta values may be reduced to alternating Euler sums with fewer summations. 
Finally, the combinations 



Ca(o, b, c) := ({a, b, c) + ({a, b, c) + ({a, b, c) 
serve to reduce 5-fold multiple zeta values to 3-fold alternating Euler sums. 
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525990827847624469523748125835264000 S(20) = 

- 15024402006639545347476341466358480896000 ir M(17, 2) + 614357286926025380403737810975588352000 tt M(15, 4) 

- 33663412982247966049519880053456896000 tt M(13, 6) + 204785762308675126801245936991862784000 ir M(15, 2) C(2) 

- 11221137660749322016506626684485632000 tt M(13, 4) C(2) - 7792456708853695844796268530892800000 tt M(13, 2) C(4) 
+ 65832426829545801661197345390290033253800417 C(20) - 1655150248639886171642409815524246277640960 C(17, 3) 

- 87407857867972646063318792204545819545600 C(17) C(3) + 239001490518032437117759318070284363571904 C(15, 5) 

+ 6475497072134876357497140759587182503936 C(15, 3) C(2) - 11343388910891633971745524946475581811513600 C(15) C(5) 
+ 76505310594054968968541596301477435326464 C(15) C(3) C(2) - 5427506872793330621343984298741119861120 C(14) ( 2 (3) 

- 33725186900885181072542216636542494977280 C(13, 7) - 1079236594149043072329862323338197518336 C(13, 5) C(2) 

- 50485931801186079342010895425290633062400 C(13, 3) C(4) - 24430610879956273104022963748303711510447040 C(13) C(7) 

+ 796530831594947602965411064203762718396416 C(13) C(5) C(2) - 48476322702940293939397763722185147340800 C(13) C(4) C(3) 

- 2459446142542578280833853163647795200 C(12) C(5, 3) - 7183917419981873615355846546110107008000 C(12) C(5) C(3) 
+ 6554036738326690659991123688262156748800 C(H, 5) C(4) - 674581129238392279111385274785342054400 C(H, 3, 3, 3) 

+ 155743130140661296228413518954716262400 C(H, 3, 3) C(3) + 13856996845301527891423305382301558784000 C(H, 3) C(6) 
+ 339959536740516778440799419126460108800 ((11, 3) C 2 (3) - 35543027806069609369237745997797431835122560 C(H) C(9) 
+ 1912599458053045671374932296869893271531520 C(H) C(7) C(2) - 8624509220693012537969847600322793702400 C(H) C(6) C(3) 

- 159424648200337153322748394462255349760000 C(H) C(5) C(4) - 386372041666595966843560058208603955200 C(H) C 3 (3) 

- 4526144521471219675886040639917260800 C(10) ((7, 3) - 4235684121072319605030836248657970626560 C(10) ((7) C(3) 

- 4274427562442524135198151261132645652480 C(10) C 2 (5) + 174910231480430088343102177690512998400 C(9, 5, 3, 3) 
+ 23201851844071266080584141499247820800 C(9, 5, 3) C(3) - 5388965272775430297200443154254448394240 C(9, 5) C(6) 

- 96144480802344282256962346694615654400 C(9, 5) C 2 (3) - 564799665543005814719751486159037931520 C(9, 3, 5, 3) 

+ 192405432086205157974414874044727296000 C(9, 3, 3, 3) C(2) - 437636171132005416578131168531552665600 C(9, 3, 3) C(5) 
+ 82410232260928579141238186701396377600 C(9, 3, 3) C(3) C(2) - 56820309831551194167334052913378508800 C(9, 3) C(8) 

- 78290750182007491017999587160883200 C(9, 3) C(5, 3) + 173223299338939829293781467642177536000 C(9, 3) C(5) C(3) 

- 5389461879726322601463723747508224000 C(9, 3) C 2 (3) C(2) + 1395360857314382550903663041050280719202304 C 2 (9) C(2) 

- 1543454230261900138881951172107169382400 C(9) C(8) C(3) - 511939532590839950285975762448130830336000 C(9) ((7) C(4) 
+ 89785104680812821069278191239404195328000 C(9) C(6) C(5) - 1309132727087420901925773113189990400 C(9) C(5, 3) C(3) 

+ 1731994708600523066371520212640192716800 C(9) C(5) C 2 (3) - 1309132727087420901925773113189990400 C(9) C(3, 5, 3) 
+ 49863866344508636305947249931911168000 C(9) C 3 (3) C(2) - 13797482183512283560940162429818580121600 C(8) ((7) C(5) 
+ 20525248296522064841059215485763379200 C(8) C 4 (3) + 533245759266957435712480647773027635200 C(7, 7, 3, 3) 
+ 39157503832984121716572521716488652800 C(7, 7, 3) C(3) + 223377519430349618539918571265503416320 ((7, 5, 5, 3) 

- 23700768289019448234404348103552000000 ((7, 5, 5) f (3) - 184392479550115407127835175133101465600 ((7, 5, 3, 5) 

- 192646077208687087875906081369587712000 C(7, 5, 3, 3) C(2) + 609805989326475901096307023020578611200 ((7, 5, 3) C(5) 

- 92883012339775157113672910718900633600 C(7, 5, 3) C(3) C(2) - 1083004232781819170351004903486259200 C(7, 3, 5, 3) C(2) 
+ 642228810086780199757027863429120000 ((7, 3, 3) C(7) + 1715165074541342577478967041720320000 C(7, 3, 3) C(5) C(2) 

- 256615593289239779065106729533440000 C 2 (7, 3) - 117611319397120633884272025717964800 C(7, 3) C(5, 3) C(2) 

+ 266618082812610684431891803668480000 ((7, 3) C 2 (5) + 68054784737327925282389654900302533120000 C 2 (7) C(6) 
+ 1571090732393362601235892759430587238400 C 2 (7) C 2 (3) - 1016434130097344129482122765187153920 ((7) C(5, 5, 3) 

- 293590313182528091317498451853312000 C(7) C(5, 3) C(5) + 1133345755060987206065174842254330470400 C(7) C 2 (5) C(3) 

+ 62419983317149231400720825830146048000 C(7) C(5) C 2 (3) C(2) + 101029201230288166627783621503025152000 C(7) C(4) C 3 (3) 

- 3927398181262262705777319339569971200 C(6) C(5, 3, 3, 3) + 88122944806249232884839801748733952000 C(6) C(5) C 3 (3) 

- 1963699090631131352888659669784985600 ((d) C(3, 5, 3) C(3) + 1239443914180202003982888789718597632 C(5, 5, 5, 3) C(2) 

- 2714534591307431519290045171362693120 C(5, 5, 3) C(5) C(2) - 7854796362524525411554638679139942400 C(5, 3, 3, 3, 3, 3) 
+ 7854796362524525411554638679139942400 C(5, 3, 3, 3, 3) C(3) - 3927398181262262705777319339569971200 C(5, 3, 3, 3) C 2 (3) 

- 117611319397120633884272025717964800 C(5, 3) C 2 (5) C(2) + 327283181771855225481443278297497600 C(5, 3) C 4 (3) 

- 310534753804603441554226729609432166400 C 4 (5) + 46595661441120443976355182952120320000 C 3 (5) C(3) C(2) 

+ 151543801845432249941675432254537728000 C 2 (5) <(4) C 2 (3) - 888340064809321326306774612521779200 C(5) C 5 (3) 

- 654566363543710450962886556594995200 C(3, 5, 3) C 3 (3) + 15584913417707391689592537061785600 C 6 (3) C(2) 

+ 31338860750207444579474396657221632000 C(8) C(9, 3) + 18542546095738616293736744327577600000 C(5) C(3) C(9,3) 
+ 8537710593272460830117524829896704000 C 2 (3) C(2) C(9, 3) + 675871149225360968655980361248931840000 C(4) C(13, 3) 
+ 254015007537749154775389871602030084096 C(2) C(15,3) - 1692980876937872291412185599949615923200 ((17,3) 

- 12361697397159077529157829551718400000 C(5) Ca(9, 3, 3) - 11383614124363281106823366439862272000 C(3) C(2) Ca(9, 3, 3) 
+ 212786017863098254535236772683579392000 C(3) (a(H, 3, 3) + 174238991699437976124847445508096000 ((2) ((6, 5,4, 3) 

+ 65242291818339575848332989300736000 C(8, 5, 4, 3) + 103014144976325646076315246264320000 C(6, 5, 6, 3) 



Table 5: Solution for 5(20) found with the three-level program 
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Parallelization: These three problems were first solved by the second author running 
a three-level implementation of PSLQ on a DecAlpha machine at the Open University, 
with a single 433 MHz processor, and 1 Gbyte of main memory. They were then used 
as benchmarks for a multiprocessor version of the new three-level multi-pair program, 
using the OpenMP programming model, on a 64-CPU SGI Origin-2000 system at the 
Lawrence Berkeley Laboratory. Run times are given in Table 6. Timings on 48 processors 
show a speedup of 19.40 times on the Fibonacci conjecture problem, 22.44 times on the 
-B4 problem, and 17.81 times on the 5(20) problem. Given the challenge of very limited 
concurrency inherent in this type of calculation, we are encouraged by these figures. 





Fibonacci 




B 4 


5(20) 


Processors 


Time 


Speedup 


Time 


Speedup 


Time 


Speedup 


1 


47788 


1.00 


90855 


1.00 


23208 


1.00 


2 


24665 


1.94 


46134 


1.97 


11973 


1.94 


4 


12945 


3.69 


23966 


3.79 


6305 


3.68 


8 


7076 


6.75 


12924 


7.03 


3470 


6.69 


16 


4180 


11.43 


7424 


12.24 


2126 


10.92 


32 


2994 


15.96 


4865 


18.68 


1548 


14.99 


48 


2463 


19.40 


4049 


22.44 


1303 


17.81 



Table 6: Timings for three large problems using the parallel three-level multi-pair program 
9. Conclusion 

We have accelerated the conventional implementation of the PSLQ algorithm in three 
ways. First, we utilized a two- level and a three- level scheme, which permit most if not all 
iterations to be performed using ordinary 64-bit double precision arithmetic, and updat- 
ing the multiprecision arrays only as needed. This resulted in a speedup of up to 65 times 
over the straightforward one-level program. Secondly, we developed a new integer relation 
algorithm, a variant of PSLQ that we have termed the "multi-pair" algorithm. We also 
demonstrated two-level and three-level implementations of this new algorithm. These 
techniques resulted in an additional speedup of up to 22%, comparing the three-level 
multi-pair program to the three-level PSLQ program. Finally, we showed how that this 
new algorithm, unlike PSLQ, is reasonably well suited for parallel processing. We demon- 
strated a parallel three-level implementation of the multi-pair algorithm that achieved an 
additional speedup of up to 22 times. 

In consequence, we are able to discover in days relations that would previously have 
taken years to unveil. We applied the programs to three large problems, obtaining results 
not previously known in the literature. We believe that these demonstrations open up a 
novel way of doing science. We are confident that many more discoveries can be made in 
this manner. 
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